**August 26, 2019
**Code corrections to PDKM's do files by D.M. Gibler
**Note: The code that follows is the code used by PDKM to generate replication of the Braithwaite and Lemke piece.
**Any changes we made to the code are prefaced by "***dmg:..***".  All file calls were of course changed
**because we use our own directories. Comments that begin with only one asterisk (*) are in the original code.


* This file is a re-replication of Gibler, Miller, and Little's replication of Braithwaite and Lemke (2011). 
* This is the first of two Braithwaite and Lemke (BL) replication files.
* It focuses on replicating how Braithwaite and Lemke's dependent variables were created.
* We could not replicate Braithwaite and Lemke's coding exactly, but we found a method that produces more similar results than GML's.

clear
capture log close

log using "Braithwaite and Lemke Coding Replication.log", replace
set more off


****************************************************************************************************************************************

* COMPARE THE DISTRIBUTION OF BL AND GML'S DEPEDENT VARIABLES IN STAGE 2 OF THE REGRESSIONS

* This produces the numbers that are reported in the text.

use "B&L_CMPS.dta", clear

* BL sample
tab mutualforce if onset==1
tab fatal_250 if onset==1
tab war if onset==1

* GML sample
tab gml_mutualforce if gml_onset==1
tab gml_fatal_250 if gml_onset==1
tab gml_war if gml_onset==1


***********************************************************************************************************************************************
 

* REPLICATE HOW BL CREATE THEIR DYADS AND DEPENDENT VARIBLES, USING MID 3.0.

* Begin by figuring out which MIDs were bilateral on the first day.

use "MIDB 3.0.dta", clear
drop if orig==0
gen num_orig=1
collapse (sum) num_orig, by(dispnum)
save "temp2.dta", replace //save this information for later

* Create dyads from MID 3.0

use "MIDB 3.0.dta", clear
mvdecode dispnum-orig, mv(-9)
drop if sidea==0
ren * *a
ren dispnuma dispnum
save "temp.dta", replace

use "MIDB 3.0.dta", clear
mvdecode dispnum-orig, mv(-9)
drop if sidea==1
ren * *b
ren dispnumb dispnum

joinby dispnum using "temp.dta"

* Drop entire MID if not bilateral on day 1

merge m:1 dispnum using "temp2.dta" //file created above
drop if num_orig>2

* Drop joiner dyads - We are retaining MIDs that have joiners (unlike GML), but not retaining the joiner dyads themselves.

drop if origa==0 | origb==0

* Identify start year

egen year=rowmax(styeara styearb)

* Eliminate duplicates by collapsing

collapse (max) hostleva hostlevb fatalitya fatalityb (min) year, by(dispnum ccodea ccodeb)

*** dmg: we add a summarize here to diagnose the merge error; there are 2,159 disputes before the merge
summ dispnum 

* Merge in necessary variables from MIDA

merge 1:1 dispnum using "MIDA 3.0.dta", keepusing(fatality recip hostlev)

*** dmg: We add two summarize calls here to diagnose the merge error; there are now 2,324 disputes.
*** dmg: The merge added 165 disputes with missing fatality and hostility values.  Note that each of these 165 
*** dmg: cases has a merge value of 2 and missing hostlev and fatality values.
summ dispnum 
summ * if ccodea==.


* Code dependent variables

* We experimented with different ways of creating these variables.
* Our goal was to find the method that gave us the closest match to Braithwaite and Lemke in each case.

*** dmg: The experiment did not go well. As above, the original code inflated the number of WARS in the 
*** dmg: data dramatically. The data is fine after the collapse call, but then the merge adds a 165 cases 
*** dmg: with missing values.  Since B&L treated missing values as every DV, PDKM inflates the number of 
*** dmg: these DV onsets by 165 for four of the six dependent variables.  We correct their code by dropping
*** dmg: these errant cases with the next line to correct the mistake.  The next line does that.
drop if ccodea==.

gen onset_mid3=1

ren recip recip_mid3 //from MIDA
replace recip_mid3=0 if recip_mid3==.

gen force_mid3=(hostlev>3) //from MIDA
replace force_mid3=0 if hostlev==.

gen mut_force_mid3=(hostleva>3 & hostlevb>3) //from MIDB
replace mut_force_mid3=0 if hostleva==. | hostlevb==.

gen fatal_mid3=(fatalitya>0 | fatalityb>0)
// Following BL, we count missing fatalities as greater than 0.

gen high_fatal_mid3=(fatalitya>3 | fatalityb>3)
// Following BL, we count missing fatalities as greater than 250. This is not the best way to construct this variable, but it produces the best match with BL.

gen war_mid3=(fatality>5)
// Following BL, we count missing fatalities as greater than 1,000. This is not the best way to construct this variable, but it produces the best match with BL.

* There should only be one dyad per year in the BL data (although there appear to be a few extras), so we collapse again

gen ccodehigh=ccodea if ccodea>ccodeb
replace ccodehigh=ccodeb if ccodeb>ccodea
gen ccodelow=ccodea if ccodea<ccodeb
replace ccodelow=ccodeb if ccodeb<ccodea
gen dyadid=ccodelow*1000+ccodehigh

*** dmg: Our reconstruction of the IVs in the B&L data included no dispute variables. Later analyses use territorial
*** dmg: disputes, so we add <dispnum> to the following collapse line.
collapse (max) onset_mid3 recip_mid3 force_mid3 mut_force_mid3 fatal_mid3 high_fatal_mid3 war_mid3 dispnum, by (dyadid year)

save "BL Dyadic MIDs 3.0.dta", replace 

* Merge into BL original dataset and see how the variables compare.
* Note: This is the BL replication dataset from Gibler's dataverse, but we added the variables ccodea and ccodeb and dropped some estimated variables.

*** dmg: The following code (until our comment "resume here") is mostly irrelevant given the error of including
*** dmg: 165 cases with missing values.

use "B&L_CMPS.dta", clear

* B&L's dyadid variable is sometimes blank, so we fill it in.

gen ccodehigh=ccodea if ccodea>ccodeb
replace ccodehigh=ccodeb if ccodeb>ccodea
gen ccodelow=ccodea if ccodea<ccodeb
replace ccodelow=ccodeb if ccodeb<ccodea
replace dyadid=ccodelow*1000+ccodehigh

* Now merge

merge m:1 dyadid year using "BL Dyadic MIDs 3.0.dta"
drop if _merge==2
drop _merge
replace onset_mid3=0 if onset_mid3==.

* Check how close the matches are

tab onset onset_mid3
tab recipro recip_mid3
tab force force_mid3
tab fatal_0 fatal_mid3
tab fatal_250 high_fatal_mid3
tab war war_mid3

* They are never an exact match, but pretty close.
*** dmg: Resume here.


************************************************************************************************************************************************

* REPLICATE HOW GML CREATED THEIR ONSET VARIABLE

*** dmg: We note that we were never asked how we approached the replication despite our attempts to work with
*** dmg: Palmer, D'Orazio, and Kenwick.  Further, we had assurances from Braithwaite and Lemke that we replicated
*** dmg: their original study.  The assertions below about what we did are not true, and we again wish we could
*** dmg: have worked with them.

* Begin by figuring out which MIDs are ever multilateral.

*** dmg: We use the current version of the data instead.
import delimited "gml-mida-2.1.csv", clear
save "gml-mida.dta", replace

import delimited "gml-midb-2.1.csv", clear

*** dmg: Following this are some quick corrections for gml 2.1 data that had "NA" for South Korea add in MID#4455; 
*** dmg: there was an errant character in the web release of our data that forced Stata to read several variables 
*** dmg: as strings.

replace fatality="0" if fatality=="NA"
destring fatality, replace
replace sidea="1" if sidea=="NA"
destring sidea, replace 
replace orig="1" if orig=="NA"
destring orig, replace 
replace hostlev="1" if hostlev=="NA"
destring hostlev, replace
rename dispnum dispnum3 
save "gml-midb.dta", replace

use "gml-midb.dta", clear
gen num_partic=1
collapse (sum) num_partic, by(dispnum3)
save "temp2.dta", replace

* Create dyads from MID 3.0

use "gml-midb.dta", clear
drop if sidea==0
ren * *a
ren dispnum3a dispnum3
save "temp.dta", replace

use "gml-midb.dta", clear
drop if sidea==1
ren * *b
ren dispnum3b dispnum3

joinby dispnum3 using "temp.dta"

* Drop entire MID if not bilateral for its entire history - this is the major difference between BL's method and GML's method.

merge m:1 dispnum3 using "temp2.dta"
drop if num_partic>2

* Drop joiner dyads - GML also seem to do this slightly differently from BL, but it only changes the coding of one observation.

drop if origa==0 & origb==0

* Identify start year

egen year=rowmax(styeara styearb)

* Eliminate duplicates by collapsing

collapse (max) hostleva hostlevb fatalitya fatalityb (min) year, by(dispnum3 ccodea ccodeb)

* Code dependent variable of interest

gen rep_gml_onset=1

* There should only be one dyad per year in the BL data, so we collapse again

gen ccodehigh=ccodea if ccodea>ccodeb
replace ccodehigh=ccodeb if ccodeb>ccodea
gen ccodelow=ccodea if ccodea<ccodeb
replace ccodelow=ccodeb if ccodeb<ccodea
gen dyadid=ccodelow*1000+ccodehigh

collapse (max) rep_gml_onset, by (dyadid year)

save "BL Dyadic MIDs 3.0 with GML Onset Coding.dta", replace 

* Merge into BL dataset and see how the variables compare.

*** dmg: The <base data> is the corrected B&L data described in the other do file.  
use "base data.dta", clear

*** dmg: We create ccodea, ccodeb, and dyadid in the reconstructed data directly and comment out PDKM code.
gen ccodea=ccode1
gen ccodeb=ccode2
gen dyadid=max(ccode1, ccode2)*1000+min(ccode1, ccode2)

* B&L's dyad id variable is sometimes blank, so we fill it in.

*** dmg: (cut) gen ccodehigh=ccodea if ccodea>ccodeb
*** dmg: (cut) replace ccodehigh=ccodeb if ccodeb>ccodea
*** dmg: (cut) gen ccodelow=ccodea if ccodea<ccodeb
*** dmg: (cut) replace ccodelow=ccodeb if ccodeb<ccodea
*** dmg: (cut) replace dyadid=ccodelow*1000+ccodehigh

* Now merge

merge m:1 dyadid year using "BL Dyadic MIDs 3.0 with GML Onset Coding.dta"
drop if _merge==2
drop _merge
replace rep_gml_onset=0 if rep_gml_onset==.


*** dmg: Since we're using new data, and the PDKM coding method, the gml_onset variable is not in the reconstructed data.
* Check how close the matches are

*** dmg: (cut) tab gml_onset rep_gml_onset
* There are only two observations different, so we seem to have almost exactly replicated GML's method of coding onset.

*** dmg: (cut) tab gml_onset onset
* There are 368 differences GMLS's and BL's onset variable. Below we show that many of these differences are due to different coding methods, not differences in the underyling data.

*** dmg: No, this was not a replication of coding method, and, non, the differences in data are not due to coding method.

************************************************************************************************************************************


* CREATE ONSET VARIABLE AND OTHER DEPENDENT VARIABLES USING GML'S ORIGINAL DATA AND BL'S METHOD

* Begin by figuring out which MIDs were bilateral on the first day.

use "gml-midb.dta", clear
drop if orig==0
gen num_orig=1
collapse (sum) num_orig, by(dispnum3)
save "temp2.dta", replace

* Create dyads from MID 3.0

use "gml-midb.dta", clear
mvdecode dispnum3-orig, mv(-9)
drop if sidea==0
ren * *a
ren dispnum3a dispnum3
save "temp.dta", replace

use "gml-midb.dta", clear
mvdecode dispnum3-orig, mv(-9)
drop if sidea==1
ren * *b
ren dispnum3b dispnum3

joinby dispnum3 using "temp.dta"

* Drop entire MID if not bilateral on day 1

merge m:1 dispnum3 using "temp2.dta"
drop if num_orig>2

* Drop joiner dyads

drop if origa==0 | origb==0

* Identify start year

egen year=rowmax(styeara styearb)

* Eliminate duplicates by collapsing

collapse (max) hostleva hostlevb fatalitya fatalityb (min) year, by(dispnum3 ccodea ccodeb)

* Merge in necessary variables from MIDA

merge 1:1 dispnum3 using "gml-mida.dta", keepusing(fatality recip hostlev)

*** dmg: We again correct the error in the merge.
drop if ccodea==0

* Code dependent variables


gen onset_gml1=1 if _mer==3

ren recip recip_gml1
replace recip_gml1=0 if recip_gml1==.

gen force_gml1=(hostlev>3)
replace force_gml1=0 if hostlev==.

gen mut_force_gml1=(hostleva>3 & hostlevb>3)
replace mut_force_gml1=0 if hostleva==. | hostlevb==.

gen fatal_gml1=(fatalitya>0 | fatalityb>0)

gen high_fatal_gml1=(fatalitya>3 | fatalityb>3)

gen war_gml1=(fatality>5)

* There should only be one dyad per year in the BL data, so we collapse again

gen ccodehigh=ccodea if ccodea>ccodeb
replace ccodehigh=ccodeb if ccodeb>ccodea
gen ccodelow=ccodea if ccodea<ccodeb
replace ccodelow=ccodeb if ccodeb<ccodea
gen dyadid=ccodelow*1000+ccodehigh

*** dmg: Our reconstruction of the IVs in the B&L data included no dispute variables. Later analyses use territorial
*** dmg: disputes, so we add <dispnum> to the following collapse line.
collapse (max) onset_gml1 recip_gml1 force_gml1 mut_force_gml1 fatal_gml1 high_fatal_gml1 war_gml1 dispnum, by (dyadid year)

save "BL Dyadic MIDs GML ISQ.dta", replace 


* Merge into BL original dataset and see how the variables compare.

*** dmg: The <base data> is the corrected B&L data described in the other do file.  
use "base data.dta", clear

*** dmg: We create ccodea, ccodeb, and dyadid in the reconstructed data directly and comment out PDKM code.
gen ccodea=ccode1
gen ccodeb=ccode2
gen dyadid=max(ccode1, ccode2)*1000+min(ccode1, ccode2)

* B&L's dyad id variable is sometimes blank, so we fill it in.

gen ccodehigh=ccodea if ccodea>ccodeb
replace ccodehigh=ccodeb if ccodeb>ccodea
gen ccodelow=ccodea if ccodea<ccodeb
replace ccodelow=ccodeb if ccodeb<ccodea
replace dyadid=ccodelow*1000+ccodehigh

* Now merge

merge m:1 dyadid year using "BL Dyadic MIDs GML ISQ.dta"
drop if _merge==2
drop _merge
replace onset_gml1=0 if onset_gml1==.

tab onset onset_gml1
* Now there are only 264 differences with BL's onset variable.
* In other words, using a method of creating the onset variable that is more similar to BL's method produces a more similar coding, despite using the GML data.


********************************************************************************************************************************************

*** dmg: The following code is now a near duplicate of the section immediately above.  However, we keep it to 
*** dmg: demonstrate another significant coding error in PDKM's replication attempt that causes PDKM to seriously
*** dmg: undercount the dependent variables when using gml data.

* CREATE ONSET VARIABLE AND OTHER DEPENDENT VARIABLES USING GML'S 2.1 DATA AND BL'S METHOD

* In addition to replicating BL's coding method with the GML data used in the ISQ article, we also replicate it with the GML 2.1 data - the latest version.

* Begin by figuring out which MIDs were bilateral on the first day.

use "GML MIDB 2.1.dta", clear
drop if sidea=="NA" //There is just one observation in this dataset for which several variables have the value of NA.
destring orig, replace //The NA codings make many of the variables strings.
drop if orig==0
gen num_orig=1
collapse (sum) num_orig, by(dispnum)
save "temp2.dta", replace

* Create dyads from GML 2.1

use "GML MIDB 2.1.dta", clear
*** dmg: Note the following code sequence.  First, PDKM define missing values in this next line.
mvdecode dispnum-orig, mv(-9)
drop if sidea=="NA"

*** dmg: Then, PDKM destring the values of the four string variables.  This creates a problem because true 
*** dmg: missing values are not coded as such.  They're omitted from the analyses.  This is especially
*** dmg: important in this case because B&L labeled missing values as positive for onset in each DV.
destring orig, replace
destring sidea, replace
destring hostlev, replace
destring fatality, replace
drop if sidea==0
ren * *a
ren dispnuma dispnum
save "temp.dta", replace

use "GML MIDB 2.1.dta", clear
*** dmg: We see the same sequence of missing values declaration and then the destrings, which dramatically
*** dmg: undercounts positive DV values when using gml data.
mvdecode dispnum-orig, mv(-9)
drop if sidea=="NA"
destring orig, replace
destring sidea, replace
destring hostlev, replace
destring fatality, replace
drop if sidea==1
ren * *b
ren dispnumb dispnum

joinby dispnum using "temp.dta"

* Drop entire MID if not bilateral on day 1

merge m:1 dispnum using "temp2.dta"
drop if num_orig>2

* Drop joiner dyads

drop if origa==0 | origb==0

* Identify start year

egen year=rowmax(styeara styearb)

* Eliminate duplicates by collapsing

collapse (max) hostleva hostlevb fatalitya fatalityb (min) year, by(dispnum ccodea ccodeb)

* Merge in necessary variables from MIDA

ren dispnum dispnum3 //This variable has different names across GML MIDA 2.1 and GML MIDB 2.1.
merge 1:1 dispnum3 using "GML MIDA 2.1.dta", keepusing(fatality recip hostlev)

*** dmg: Correcting the same error a third time.
drop if ccodea==.

* Code dependent variables

gen onset_gml2=1

ren recip recip_gml2
replace recip_gml2=0 if recip_gml2==.

gen force_gml2=(hostlev>3)
replace force_gml2=0 if hostlev==.

gen mut_force_gml2=(hostleva>3 & hostlevb>3)
replace mut_force_gml2=0 if hostleva==. | hostlevb==.

gen fatal_gml2=(fatalitya>0 | fatalityb>0)

gen high_fatal_gml2=(fatalitya>3 | fatalityb>3)

gen war_gml2=(fatality>5)

* There should only be dyad per year in the BL data, so we collapse again

gen ccodehigh=ccodea if ccodea>ccodeb
replace ccodehigh=ccodeb if ccodeb>ccodea
gen ccodelow=ccodea if ccodea<ccodeb
replace ccodelow=ccodeb if ccodeb<ccodea
gen dyadid=ccodelow*1000+ccodehigh

collapse (max) onset_gml2 recip_gml2 force_gml2 mut_force_gml2 fatal_gml2 high_fatal_gml2 war_gml2, by (dyadid year)

*** dmg: We conduct no further analyses with the errant code. It was kept just to show the missing value/destring sequence.
*** dmg: (cut) save "BL Dyadic MIDs GML 2.1.dta", replace 
* This will be used for later analysis.


********************************************************************************************************************************************


* CREATE ONSET VARIABLE AND OTHER DEPENDENT VARIABLES USING THE MID 4.3 DATA AND BL'S METHOD

* Begin by figuring out which MIDs were bilateral on the first day.

use "MIDB 4.3.dta", clear
drop if orig==0
gen num_orig=1
collapse (sum) num_orig, by(dispnum3)
save "temp2.dta", replace

* Create dyads from MID 3.0

use "MIDB 4.3.dta", clear
mvdecode dispnum3-orig, mv(-9)
drop if sidea==0
ren * *a
ren dispnum3a dispnum3
save "temp.dta", replace

use "MIDB 4.3.dta", clear
mvdecode dispnum3-orig, mv(-9)
drop if sidea==1
ren * *b
ren dispnum3b dispnum3

joinby dispnum3 using "temp.dta"

* Drop entire MID if not bilateral on day 1

merge m:1 dispnum3 using "temp2.dta"
drop if num_orig>2

* Drop joiner dyads

drop if origa==0 | origb==0

* Identify start year

egen year=rowmax(styeara styearb)

* Eliminate duplicates by collapsing

collapse (max) hostleva hostlevb fatalitya fatalityb (min) year, by(dispnum3 ccodea ccodeb)

* Merge in necessary variables from MIDA

merge 1:1 dispnum3 using "MIDA 4.3.dta", keepusing(fatality recip hostlev)

*** dmg: Fourth time.
drop if ccodea==.

* Code dependent variables

gen onset_mid4=1

ren recip recip_mid4
replace recip_mid4=0 if recip_mid4==.

gen force_mid4=(hostlev>3)
replace force_mid4=0 if hostlev==.

gen mut_force_mid4=(hostleva>3 & hostlevb>3)
replace mut_force_mid4=0 if hostleva==. | hostlevb==.

gen fatal_mid4=(fatalitya>0 | fatalityb>0)

gen high_fatal_mid4=(fatalitya>3 | fatalityb>3)

gen war_mid4=(fatality>5)

* There should only be one dyad per year in the BL data, so we collapse again

gen ccodehigh=ccodea if ccodea>ccodeb
replace ccodehigh=ccodeb if ccodeb>ccodea
gen ccodelow=ccodea if ccodea<ccodeb
replace ccodelow=ccodeb if ccodeb<ccodea
gen dyadid=ccodelow*1000+ccodehigh

collapse (max) onset_mid4 recip_mid4 force_mid4 mut_force_mid4 fatal_mid4 high_fatal_mid4 war_mid4 dispnum, by (dyadid year)

save "BL Dyadic MIDs 4.3.dta", replace 
* This will be used for later analysis.
